Decoding Cult Classics: What Separates the Films That Stick?
In a time when data science is often used to tackle heavy societal issues like healthcare, economics, climate, and inequality, we decided to take a different route. We wanted to remind ourselves that creativity and curiosity also have a place in analytics. We chose movies, a fun and creative domain, and approached them with a serious analytical mindset. Movies resist quantification. Their interpretations are subjective, emotionally driven, and culturally dependent. To us, this is exactly what makes the challenge interesting. Our project differed from typical data science work not only in the questions we asked but also in the type of data we worked with. Our datasets involved messy concepts such as cult status, genre identity, and fandom behavior, which are very different from typical structured, numerical data. Our project sits at the intersection of AI, culture, and entertainment analytics. We followed standard data science practices such as data collection, cleaning, and visualization, and combined them with large language model-based labeling to analyze a concept that does not naturally exist in structured form.
Cult classics are films with particularly dedicated fanbases and cultural staying power. Throughout the semester, we aimed to determine, first, whether there are characteristics that distinguish cult classics from other films and, second, whether we can measure them. This question is significant because existing scholarship on cult cinema largely emphasizes qualitative dimensions such as transgression, niche audiences, and subcultural identity, while offering limited quantitative analysis of film attributes (see Mathijs and Mendik, The Cult Film Reader). Because no cult status indicator previously existed, our first step was to create one with the assistance of generative AI; this variable became the basis for our empirical investigation. The next phase of our project employed traditional data science methods to explore trends in thematic keywords, genres, and popularity over time for cult and non-cult films. The last phase, rather ambitiously, implemented statistical machine learning tools to model and attempt to predict future cult status for recent films.
Data
We work with two main preexisting data sources: The Movie Database (TMDb) and the Internet Movie Database (IMDb). Our TMDb dataset contains one row per movie for the 10,000 highest-rated films by TMDb user score and includes fields such as title, release date, budget, revenue, and average vote. We also attach an LLM-generated binary label (cult = 1/0) derived from a prompt-based cult score (the number of repeated queries that flag the film as a cult classic) and a chosen threshold (at least four of five). Cleaning for TMDb focuses on standardizing identifiers and dates, removing duplicates, handling implicit missingness, and normalizing text fields so that titles match consistently across sources.
IMDb provides two types of information that TMDb does not: audience activity over time and credit metadata. For ratings activity, we use an IMDb table of votes over time with fields such as ID number, year, number of ratings, and a timestamp for each vote. This data powers our time series curves and the derived cult trajectory metrics. Cleaning on the IMDb side includes filtering to titles that appear in our TMDb list, coercing years to numeric, removing implausible years, aggregating multiple vote records to a single count per ID and year when needed, and ensuring consistent ID formats. Finally, we join the two sources by matching cleaned titles and release years, then carry the IMDb/TMDb IDs forward so that all downstream modeling and visualizations use a single consistent movie identifier.
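The title-and-year join described above can be sketched as follows. This is a minimal illustration in Python rather than the project's own code: the normalization rules, the field names (`tmdb_id`, `imdb_id`, `release_year`, `year`), and the placeholder IDs are all assumptions made for the example.

```python
import re
import unicodedata

def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace (assumed rules)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every run of non-alphanumeric characters with a single space.
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()

def join_key(title: str, year) -> tuple:
    """Build the (cleaned title, integer year) key used to match rows."""
    return (normalize_title(title), int(year))

# Hypothetical rows; IDs are placeholders, not real TMDb/IMDb identifiers.
tmdb_rows = [
    {"tmdb_id": 1, "title": "Amélie", "release_year": "2001"},
    {"tmdb_id": 2, "title": "Pride & Prejudice", "release_year": "2005"},
]
imdb_rows = [
    {"imdb_id": "tt-a", "title": "Amelie", "year": 2001},
    {"imdb_id": "tt-b", "title": "Pride &  Prejudice", "year": 2005.0},
    {"imdb_id": "tt-c", "title": "Unmatched Film", "year": 1999},
]

# Index the IMDb side by key, then attach the IMDb ID to each matching TMDb row.
imdb_by_key = {join_key(r["title"], r["year"]): r["imdb_id"] for r in imdb_rows}
matched = []
for row in tmdb_rows:
    key = join_key(row["title"], row["release_year"])
    if key in imdb_by_key:
        matched.append({**row, "imdb_id": imdb_by_key[key]})
```

Matching on title and year together reduces false matches between remakes or films that share a title, though two same-titled films released in the same year would still collide under this key.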
As touched on in the introduction, there is no indicator of cult status in any publicly available data. Since our goal was to explore and model characteristics of cult films, we needed a dependent variable. To remedy this, we used the OpenAI API to feed gpt-4.1-mini the title and release date of every film in our dataset released before 2010 and asked for a structured output containing a cult 1/0 indicator variable. To stabilize the variability inherent in generative AI responses, we ran the query five times with the same prompt and treated as cult films only those identified as cult classics in at least four of the five queries.
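The four-of-five aggregation rule can be illustrated with a short Python sketch. The film names and vote lists below are made up, and the function name `aggregate_cult_votes` is hypothetical; the API calls that would produce the votes are not shown.

```python
def aggregate_cult_votes(votes_by_film, min_yes=4):
    """Label a film cult (1) when at least `min_yes` of its repeated 0/1 votes are 1."""
    return {
        film: 1 if sum(votes) >= min_yes else 0
        for film, votes in votes_by_film.items()
    }

# Hypothetical results of five repeated queries per film (1 = flagged as cult).
runs = {
    "Film A": [1, 1, 1, 1, 0],  # 4 of 5 -> cult
    "Film B": [1, 1, 0, 0, 1],  # 3 of 5 -> not cult
    "Film C": [0, 0, 0, 0, 0],  # 0 of 5 -> not cult
}
cult_labels = aggregate_cult_votes(runs)
```

Requiring four of five rather than a simple majority makes the label more conservative: a film flagged in only three runs is treated as non-cult.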
Processes and outcomes
Our main question, “what makes a movie a cult classic?”, ended up being less about a single “secret ingredient” and more about a handful of measurable signals that often come together: what the movie is like, who it seems made for, and how audiences find it over time. Across the results, the strongest pattern is that cult status tends to emerge when a film is distinctive enough to attract a specific audience, and that audience has reasons to keep returning to it and recommending it.
Two descriptive visuals capture the content signature of cult films: the genre-proportion heatmap and the top keywords bar chart. In the heatmap, the cult column is visibly darker for Horror, Thriller, and Comedy, while the non-cult column is dominated by Drama, a more mainstream “default” genre bucket. The very slight presence of Animation, Family, and War among cult classics is also notable, suggesting that in this dataset cult status rarely forms around films that are either aimed at a broader audience or tied to conventional prestige or historical framing. Genre alone does not define cult status, but the two are clearly correlated.


The keyword chart shows how cult films differ within or alongside genres. The most common keywords cluster around dark, transgressive, or subcultural themes. Importantly, some keywords are not just themes but signals of style and structure, like a stinger during or after the credits, which point to films that reward insider viewing habits. The broader trend here is that cult classics aren’t just “good movies people like”; they are often movies with strong identity markers that make it easy for fans to rally around them and recommend them to other fans.
These visuals also imply why cult status can feel intangible. Our measured features capture what is on screen, but cult status also depends on social transmission, viewing rituals, and timing.
The Shiny app lets us treat “cultness” as something you can see in the shape of audience attention over time, not just in a genre label. For any selected movie, it plots ratings activity by year and overlays a smoothed curve, then summarizes that curve with four interpretable metrics: Time-to-takeoff (T₅₀), the number of years it takes to reach 50% of lifetime ratings; peak-lag, the time from release to the highest point of attention; a long-tail ratio comparing ratings in years 5–15 to ratings in years 0–2; and decay half-life, how quickly attention falls after the peak. Together, these metrics reveal at least two common “pathways” a film can take. Some titles look like slow-burn / rediscovery cases: they reach T₅₀ late, peak years after release, and have long-tail ratios above 1, meaning a substantial portion of attention arrives well after the initial release window—consistent with word-of-mouth spread, rewatching, streaming-era rediscovery, or fandom growth over time. In the Pride & Prejudice (2005) example shown, the pattern is strongly late-life: T₅₀ = 11 years, peak-lag = 11 years, and a long-tail ratio ≈ 4.14, indicating that far more engagement happens years later than in the first two years. Other movies show the opposite mainstream spike profile: low T₅₀ and short peak-lag, with attention concentrated near release and then fading relatively quickly, which matches films that are heavily consumed during their initial marketing/theatrical moment. This time-horizon framing helps us move beyond “cult is a vibe” and toward measurable subtypes: even without explicitly modeling director reputation, the curves point to different mechanisms of visibility—launch-driven attention versus delayed, community-driven accumulation—that map closely onto how cult followings tend to form.
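As a rough illustration of the four metrics, the Python sketch below computes them from a list of yearly rating counts. It is not the Shiny app's code: it works on raw counts rather than the smoothed curve, and the half-life definition (years after the peak until the yearly count first falls to half the peak or less) is an assumption, since the text does not spell it out. The two example series are invented.

```python
def trajectory_metrics(counts):
    """counts[i] = number of ratings i years after release (raw, unsmoothed)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("film has no ratings")

    # T50: first year at which cumulative ratings reach half the lifetime total.
    running, t50 = 0, None
    for year, n in enumerate(counts):
        running += n
        if running >= 0.5 * total:
            t50 = year
            break

    # Peak-lag: years from release to the busiest year (first one if tied).
    peak_lag = max(range(len(counts)), key=lambda y: counts[y])

    # Long-tail ratio: ratings in years 5-15 divided by ratings in years 0-2.
    early, late = sum(counts[0:3]), sum(counts[5:16])
    long_tail = late / early if early > 0 else None

    # Decay half-life (assumed definition): years after the peak until the
    # yearly count first drops to half of the peak value or below.
    half_life = None
    for offset, n in enumerate(counts[peak_lag:]):
        if offset > 0 and n <= counts[peak_lag] / 2:
            half_life = offset
            break

    return {"t50": t50, "peak_lag": peak_lag,
            "long_tail": long_tail, "half_life": half_life}

# Made-up yearly counts: a launch-driven spike and a slow-burn profile.
spike_m = trajectory_metrics([100, 60, 30, 20, 10, 5, 5, 5, 5, 5])
slow_m = trajectory_metrics([5, 5, 10, 10, 20, 30, 50, 80, 40, 20])
```

In this toy data the slow-burn series has a long-tail ratio above 1 and a late T₅₀, while the spike series peaks at release and reaches T₅₀ within a year, mirroring the two pathways described above.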
The Shiny app is included as a separate .qmd file located in the same folder as this file in the GitHub repository.
These two plots scale up the Shiny-app idea from single movies to the whole dataset by summarizing the typical ratings trajectory for cult vs. non-cult films.
In the first figure, we align every movie by years since release and convert yearly rating counts into the share of that film’s lifetime ratings, so the comparison is about shape and timing rather than raw popularity. The result is a clear split: non-cult films are more front-loaded, with a sharp spike in the first 1–2 years and then a gradual decline, consistent with mainstream release cycles and early mass attention. Cult classics, in contrast, show a smaller initial spike but relatively higher activity later, especially in the mid-life window (roughly years 5–15) and a subtle late lift, suggesting a longer “afterlife” driven by rediscovery, niche communities, and repeated recommendation rather than a single launch moment.
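The normalization behind this figure can be sketched in Python as follows. The sketch assumes each film is a list of yearly rating counts indexed by years since release, that shares are computed within the 0–20 year window, and that group curves are simple per-year means; the function names and example counts are hypothetical.

```python
from collections import defaultdict

def lifetime_shares(counts, max_years=20):
    """Convert yearly counts into each year's share of the film's windowed total."""
    window = counts[: max_years + 1]
    total = sum(window)
    return [n / total for n in window] if total else []

def mean_share_curve(films, max_years=20):
    """Average the per-film share curves year by year across a group of films."""
    sums = defaultdict(float)
    n_films = defaultdict(int)
    for counts in films:
        for year, share in enumerate(lifetime_shares(counts, max_years)):
            sums[year] += share
            n_films[year] += 1
    return [sums[y] / n_films[y] for y in sorted(sums)]

# Made-up groups: each inner list holds one film's counts by years since release.
cult_curve = mean_share_curve([[10, 10, 20, 60], [0, 20, 30, 50]])
non_cult_curve = mean_share_curve([[60, 20, 10, 10], [70, 20, 10, 0]])
```

Because every film is rescaled to sum to 1 before averaging, a blockbuster with millions of ratings and a niche film with a few thousand contribute equally to the shape of their group's curve.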

The second figure makes that timing difference even easier to interpret by tracking how quickly ratings accumulate over a film’s life. The dashed vertical lines mark T₅₀, the time it takes the average film to reach 50% of its lifetime ratings. Here, the non-cult curve reaches the halfway point earlier, while the cult curve hits T₅₀ later, meaning cult attention builds more gradually over time. Together, these visuals reinforce the central pattern behind our curve metrics: cult classics are less dominated by early release period attention and more defined by a long tail / slow-burn accumulation, which matches what the Shiny app shows at the individual movie level, just averaged across thousands of titles. Because everything is expressed in shares and within a capped 0–20 year window, the takeaway is explicitly about relative timing, not absolute “how popular” a movie is.
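A cumulative curve and its halfway point can be derived from a share curve with a few lines of Python. The two averaged share curves below are invented inputs, and `halfway_year` is a hypothetical helper name, not something taken from the project.

```python
from itertools import accumulate

def cumulative_curve(shares):
    """Running total of yearly shares: the fraction of lifetime ratings reached so far."""
    return list(accumulate(shares))

def halfway_year(shares):
    """First year (since release) at which the cumulative share reaches 50%."""
    for year, cum in enumerate(accumulate(shares)):
        if cum >= 0.5:
            return year
    return None  # the curve never reaches 50% within the observed window

# Invented averaged share curves for the two groups.
non_cult_avg = [0.65, 0.20, 0.10, 0.05]
cult_avg = [0.05, 0.15, 0.25, 0.55]

non_cult_t50 = halfway_year(non_cult_avg)
cult_t50 = halfway_year(cult_avg)
cult_cumulative = cumulative_curve(cult_avg)
```

With these inputs the non-cult group crosses the halfway mark in its release year, while the cult group does not cross it until year 3, which is the kind of gap the dashed lines in the figure make visible.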

The final test of the quantifiability of cult classics is to model and predict. The outcome variable for this modeling is the LLM-generated cult label described in the Data section. We prompted the OpenAI API in RStudio using single-shot reasoning with a structured output, feeding the LLM the titles and release dates of all films in our dataset released before 2010 and asking for a binary cult classification. To account for the variability inherent in generative AI responses, we ran the same query five times and aggregated the results so that our final cult classification included only films identified as cult classics in at least four of the five queries. We then performed human verification of the LLM classification by manually classifying a random sample of 50 films. The manual and LLM classifications aligned on 40 of the 50 selected films (80%), and most of the disagreement cases were not clear-cut and could have gone either way.
Reasonably confident in the LLM’s classifications, we proceeded to modeling. Our final mean model included average user scores, popularity, logged budget, logged return on investment (box office return divided by budget), runtime, and a vector of dummy variables for the genres that were found to be statistically significant and had relatively large sample sizes. After comparing candidate models using out-of-sample validation, we settled on an ensemble model that combined the predictions of a parametric logistic regression model and a non-parametric random forest model. We then used this model to predict the probability of films released after 2009 becoming cult classics.
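The feature construction and the ensembling step can be sketched in Python as below. Several details are assumptions: the text does not say how the two models' predictions were combined, so an equal-weight average of predicted probabilities is used here; the field names, the `kept_genres` list, and the example film are hypothetical; and the fitted logistic regression and random forest models themselves are not shown.

```python
import math

def build_features(film, kept_genres=("Horror", "Comedy")):
    """Build one feature row; returns None when budget or revenue is missing (<= 0)."""
    budget, revenue = film["budget"], film["revenue"]
    if budget <= 0 or revenue <= 0:
        return None  # logs are undefined; zero often stands in for a missing value
    row = {
        "vote_average": film["vote_average"],
        "popularity": film["popularity"],
        "log_budget": math.log(budget),
        "log_roi": math.log(revenue / budget),  # ROI = box office return / budget
        "runtime": film["runtime"],
    }
    # One 0/1 dummy per retained genre (the genre list here is hypothetical).
    for genre in kept_genres:
        row["genre_" + genre] = 1 if genre in film["genres"] else 0
    return row

def ensemble_probability(p_logistic, p_forest, weight=0.5):
    """Weighted average of two predicted probabilities (equal weights assumed)."""
    return weight * p_logistic + (1 - weight) * p_forest

# Hypothetical film and hypothetical model outputs.
example = build_features({
    "vote_average": 7.1, "popularity": 12.3, "budget": 1_000_000,
    "revenue": 4_000_000, "runtime": 95, "genres": ["Comedy", "Drama"],
})
p_cult = ensemble_probability(p_logistic=0.2, p_forest=0.6)
```

Averaging probabilities lets the logistic regression's smooth, roughly linear signal and the random forest's ability to capture interactions offset each other's weaknesses; other combination schemes, such as stacking, would also fit the description in the text.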

This visualization shows the 10 films released since 2010 with the highest and lowest probabilities of becoming cult classics, plus a few additional films that serve as gut-checks. Note that this visualization excludes all films with “horror” as their primary genre, since, if included, these films would dominate the high-probability side. As the color of the points shows, films with the lowest probabilities tend to be blockbuster adventure and biopic films, while films with the highest probabilities tend to be obscure comedies. While some of the top films appear to be rather uninteresting, the top two are very promising. The film with the highest probability, Heartbeats, is about a French-Canadian love triangle, while the second film, Alps, is a Greek psychological drama art film. Both of these, at least at a glance, appear to have significant cult appeal. Additionally, Alps is directed by the popular director Yorgos Lanthimos. His status creates its own cult appeal, drawing fans back to his older and less popular works, like Alps. Several other films also indicate that the model is functioning well. Avengers: Infinity War, despite critical acclaim, has a very low probability. This makes sense because it broke many box office records and could never be considered remotely “cult”. Films like The Holdovers, Portrait of a Lady on Fire, Good Time, and Bottoms all have subversive themes and cult appeal in their content, but are too well loved and too popular to have particularly high probabilities. While the formation of a cult around a piece of media remains intangible, this model at the very least does a good job of “faking it”, that is, of separating plausible candidates from implausible ones.
So what makes a cult classic based on these results? A cult classic is most often a movie with strong identity cues, niche target appeal that makes it easy for fans to find community, and an attention curve that supports that community-building. The “recipe” is real in the sense that these signals cluster, but cult status stays partly intangible because it ultimately requires collective adoption over time, which is why our trajectory metrics and prediction work complement each other. They measure how cultness happens, not just what the film is.
Limitations and Future Directions
A key limitation is that our “cult” outcome is only a placeholder for a social phenomenon that is partly cultural and community-driven. The LLM label (and therefore the model trained on it) may encode bias, especially toward horror or edgy keywords. And even after removing horror in one figure, the underlying signal can still privilege genres and tropes that sound cult-like or that people online describe as cult-like rather than films that have actually developed sustained fan communities. Similarly, our genre and keyword visuals summarize what is common, but they don’t prove causation. Genres and themes correlate with cult status, yet they may just be markers of niche marketing or availability rather than the reason a fandom formed. The ratings trajectory app helps by introducing a time dimension, but it also has blind spots. Ratings volume reflects who is on the platform and when, and a slow-burn curve could reflect streaming release timing, re-releases, awards, etc., rather than cult adoption. Finally, our visuals don’t directly measure the mechanisms people often associate with cult classics, such as public screening, quotability, online communities, controversy, or critical evaluation. So an open question is how much cult status is driven by social transmission and context versus the measurable content features we captured. These gaps point to next steps like incorporating glide paths/adoption curves in prediction models, separating theatrical vs streaming eras, and adding signals for fandom activity to better distinguish true cult formation from general late popularity.
Contributions
Eric
Led the analysis of genre- and keyword-based features used to characterize cult classic films. He was responsible for cleaning and structuring the nested genre and keyword data, developing the genre and keyword proportion heatmaps, and ensuring these analyses were reproducible through properly saved and loaded processed datasets. In addition to his analytical contributions, Eric coordinated and scheduled meetings with the external domain expert, helped streamline communication among team members, and translated group discussions into concrete analytical steps. He also contributed to multiple sections of the written report, refining methodological descriptions and results narratives to improve clarity and coherence.
Natalia
Led the analysis of ratings over time for individual movies and general patterns. Created a Shiny app with four metrics that help identify whether a movie is a slow-burn or a mainstream release, the former being a better candidate for cult status. Created two plots to visualize patterns in the number of ratings over time (years since release) for movies classified as cult by Sebastian’s LLM versus non-cult movies. Created lists and outlines for task delegation, set meeting times with Google Calendar, and contributed to the final screencast slides for research questions, data description, limitations, and open questions. Organized the GitHub repository for final submission and created and organized the qmd for the final narrative. Wrote the data description, the interpretations for the Shiny app and ratings plots, and the Limitations section for the final narrative.
Sebastian
Suggested the project topic. Found and led discussions with the domain expert. Queried the TMDb API to pull the bulk of the data used. Adapted code from Shilad’s Hallucination Detection activity to query ChatGPT using the OpenAI API and aggregate results to create the cult classifier outcome variable. Manually classified a random sample of 50 films and compared the results with the LLM classifications to verify their validity. Created and tuned an optimal mean model for the available data. Tested parametric and non-parametric models using out-of-sample validation and model selection/evaluation metrics to create a final ensemble model. Used the final model to predict future cult status for recent films. Created and presented a visualization of these predictions in the final presentation and written submission. Adapted the final Quarto document to allow for differences in file organization between group members by including PNGs of data visualizations. Wrote the portions of the final written submission pertaining to his own work and suggested and implemented edits to other sections.